SwePub
Tyck till om SwePub Sök här!
Sök i SwePub databas

  Utökad sökning

Träfflista för sökning "db:Swepub ;pers:(Lu Zhonghai);pers:(Zou Zhuo)"

Sökning: db:Swepub > Lu Zhonghai > Zou Zhuo

  • Resultat 1-7 av 7
Sortera/gruppera träfflistan
   
NumreringReferensOmslagsbildHitta
1.
  • Huan, Yuxiang, et al. (författare)
  • A 101.4 GOPS/W Reconfigurable and Scalable Control-Centric Embedded Processor for Domain-Specific Applications
  • 2016
  • Ingår i: IEEE Transactions on Circuits and Systems Part 1. - : IEEE. - 1549-8328 .- 1558-0806. ; 63:12, s. 2245-2256
  • Tidskriftsartikel (refereegranskat)abstract
    • Adapting the processor to the target application is essential in the Internet-of-Things (IoT), and thus requires customizability in order to improve energy efficiency and scalability to provide sufficient performance. In this paper, a reconfigurable and scalable control-centric architecture is proposed, and a processor consisting of two cores and an on-chip multi-mode router is implemented. Reconfigurability is enabled by a programmable sequence mapping table (SMT) which reorganizes functional units in each cycle, thus increasing hardware utilization and reducing excessive data movement for high energy efficiency. The router facilitates both wormhole and circuit switching to construct intra- or inter-chip interconnections, providing scalable performance. Fabricated in a 65-nm process, the chip exhibits 101.4 GOPS/W energy efficiency with a die size of 3.5 mm(2). The processor carries out general-purpose processing with a code size 29% smaller than the ARM Cortex M4, and improves the performance of application-specific processing by over ten times when implementing AES and RSA using SMTs instead of general-purpose C. By utilizing the on-chip router, the processor can be interconnected up to 256 nodes, with a single link bandwidth of 1.4 Gbps.
  •  
2.
  • Ma, Ning, et al. (författare)
  • A 101.4 GOPS/W Reconfigurable and Scalable Control-centric Embedded Processor for Domain-specific Applications
  • 2016
  • Ingår i: Proceedings - IEEE International Symposium on Circuits and Systems. - : IEEE. - 9781479953400 ; , s. 1746-1749
  • Konferensbidrag (refereegranskat)abstract
    • Increasing the energy efficiency and performance while providing the customizability and scalability is vital for embedded processors adapting to domain-specific applications such as Internet of Things. In this paper, we proposed a reconfigurable and scalable control-centric architecture, and implemented the design consisting of two cores and an on-chip multi-mode router in 65 nm technology. The reconfigurability is enabled by the restructurable sequence mapping table (SMT) thus the reorganizable functional units. Owing to the integration of the multi-mode router, on-chip or inter-chip network for multi-/many-core computing can be composed for performance extension on demand even in the post-fabrication stage. Control-centric design simplifies the control logic, shrinks the non-functional units and orchestrates the operations to increase the hard are utilization and reduce the excessive data movement for high energy efficiency. As a result, the processor can both conduct general-purpose processing with 29% smaller code size and application-specific processing with over 10 times performance improvement when implementing AES by SMT. The dual-core processor consumes 19.7 μW/MHz with die size of 3.5 mm2. The achieved energy efficiency is 101.4GOPS/W.
  •  
3.
  •  
4.
  • Ma, Ning, et al. (författare)
  • A Hierarchical Reconfigurable Micro-coded Multi-core Processor for IoT Applications
  • 2014
  • Ingår i: 2014 9TH INTERNATIONAL SYMPOSIUM ON RECONFIGURABLE AND COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC). - 9781479958108
  • Konferensbidrag (refereegranskat)abstract
    • This paper presents a micro-coded multi-core processor featuring reconfigurability and scalability with high energy efficiency for IoT domain-specific applications. By simplifying the control logic and removing the pipelines, the gate count of one core is minimized to 14 K. Meanwhile, all the hardware units are directly controlled and can be reorganized by the long microinstructions. High utilization of the hardware is thus achieved when designing the micro programs properly. Furthermore, both the ISAs for C and Java have been implemented by the micro programs to supply the general-purpose programmability. Besides, application-specific instructions can be further developed once higher performance is demanded in specific scenarios. Depending on the performance requirement, the activity and working strategies of the cores are adjustable. Moreover, several processors can be further connected to construct a network with the integrated router for even higher performance. As a case study, the AES encryption is implemented using both C and micro programs. More than 10 times of performance improvement is achieved when using micro programs on the single core, and 20 times on two cores.
  •  
5.
  • Ma, Ning, et al. (författare)
  • Design and Implementation of Multi-mode Routers for Large-scale Inter-core Networks
  • 2016
  • Ingår i: Integration. - : Elsevier. - 0167-9260 .- 1872-7522. ; 53, s. 1-13
  • Tidskriftsartikel (övrigt vetenskapligt/konstnärligt)abstract
    • Constructing on-chip or inter-silicon (inter-die/inter-chip) networks to connect multiple processors extends the system capability and scalability. It is a key issue to implement a flexible router that can fit into various application scenarios. This paper proposes a multi-mode adaptable router that can support both circuit and wormhole switching with supplying flexible working strategies for specific traffic patterns in diverse applications. The limitation of mono-mode switched routers is shown at first, followed by algorithm exploration in the proposed router for choosing the proper working strategy in a specific network. We then present the performance improvement when applying the mixed circuit/wormhole switching mode to different applications, and analyze the image decoding as a case study. The multi-mode router has been implemented with different configurations in a 65 nm CMOS technology. The one with 8-bit flit width is demonstrated together with a multi-core processor to show the feasibility. Working at 350 MHz, the average power consumption of the whole system is 22 mW.
  •  
6.
  • Ma, Ning, et al. (författare)
  • Implementing MVC Decoding on Homogeneous NoCs : Circuit Switching or Wormhole Switching
  • 2015
  • Konferensbidrag (refereegranskat)abstract
    • To implement multiview video decoding on network on-chip (NoC) based homogeneous multicore architectures, the selection of switching techniques for routers is one of the most important aspects for design space exploration. Circuit switching and wormhole switching are two most feasible switching techniques for on-chip networks. To choose the suitable switching technique, we perform the comparison on decoding speed of the whole system, link utilization and delay between circuit switching and wormhole switching for implementing eight-view QVGA video decoding on 4 × 4 NoCs at 30 fps. The required link bandwidths are both around 800 Mbps with the similar network utilization and delay. We conclude that, to implement multiview video decoding on homogeneous NoCs, circuit switching is more suitable considering the similar performance and lower cost compared with wormhole switching.
  •  
7.
  • Ma, Ning (författare)
  • Ultra-low-power Design and Implementation of Application-specific Instruction-set Processors for Ubiquitous Sensing and Computing
  • 2015
  • Doktorsavhandling (övrigt vetenskapligt/konstnärligt)abstract
    • The feature size of transistors keeps shrinking with the development of technology, which enables ubiquitous sensing and computing. However, with the break down of Dennard scaling caused by the difficulties for further lowering supply voltage, the power density increases significantly. The consequence is that, for a given power budget, the energy efficiency must be improved for hardware resources to maximize the performance. Application-specific integrated circuits (ASICs) obtain high energy efficiency at the cost of low flexibility for various applications, while general-purpose processors (GPPs) gain generality at the expense of efficiency.To provide both high energy efficiency and flexibility, this dissertation explores the ultra-low-power design of application-specific instruction-set processors (ASIP) for ubiquitous sensing and computing. Two application scenarios, i.e. high-throughput compute-intensive processing for multimedia and low-throughput low-cost processing for Internet of Things (IoT) are implemented in the proposed ASIPs.Multimedia stream processing for human-computer interaction is always featured with high data throughput. To design processors for networked multimedia streams, customizing application-specific accelerators controlled by the embedded processor is exploited. By abstracting the common features from multiple coding algorithms, video decoding accelerators are implemented for networked multi-standard multimedia stream processing. Fabricated in 0.13 $\mu$m CMOS technology, the processor running at 216 MHz is capable of decoding real-time high-definition video streams with power consumption of 414 mW.When even higher throughput is required, such as in multi-view video coding applications, multiple customized processors will be connected with an on-chip network. Design problems are further studied for selecting the capability of single processors, the number of processors, the capacity of communication network, as well as the task assignment schemes.In the IoT scenario, low processing throughput but high energy efficiency and adaptability are demanded for a wide spectrum of devices. In this case, a tile processor including a multi-mode router and dual cores is proposed and implemented. The multi-mode router supports both circuit and wormhole switching to facilitate inter-silicon extension for providing on-demand performance. The control-centric dual-core architecture uses control words to directly manipulate all hardware resources. Such a mechanism avoids introducing complex control logics, and the hardware utilization is increased. Programmable control words enable reconfigurability of the processor for supporting general-purpose ISAs, application-specific instructions and dedicated implementations. The idea of reducing global data transfer also increases the energy efficiency. Finally, a single tile processor together with network of bare dies and network of packaged chips has been demonstrated as the result. The processor implemented in 65 nm low leakage CMOS technology and achieves the energy efficiency of 101.4 GOPS/W for each core.
  •  
Skapa referenser, mejla, bekava och länka
  • Resultat 1-7 av 7

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy